| 0 | Question (perception) | What are the important objects in the current scene? Those objects will be considered for the future reasoning and driving decision. |
| GT Answer | There is a white SUV to the back of the ego vehicle, a black green SUV to the front of the ego vehicle, a white bus to the front right of the ego vehicle, a blue car to the front right of the ego vehicle, and a red light to the front of the ego vehicle. The IDs of these objects are <c1,CAM_BACK,743.3,593.3>, <c2,CAM_FRONT,753.5,611.3>, <c3,CAM_FRONT_RIGHT,94.9,504.2>, <c4,CAM_FRONT_RIGHT,823.1,719.9>, and <c5,CAM_FRONT,765.8,302.3>. |
| InternVL2-2B | In the current scene, the important objects include:
1. **SUV**: The white SUV is prominently positioned in the foreground, indicating it is likely a key vehicle for the driver. It is facing the camera, suggesting it is stationary or moving slowly.
2. **Bus**: The white bus is also in the scene, moving towards the right side of the image. It is a significant object as it is a public transport vehicle, and its presence indicates the urban setting.
3. **Pedestrian**: A pedestrian is visible on the sidewalk, walking towards the right side of the image. This person is an important object for the driver, as they are likely a pedestrian crossing the street.
4. **Buildings**: The buildings lining the street are significant as they provide context to the urban environment. They are multi-story structures with large windows, which are typical of commercial or mixed-use buildings.
5. **Road**: The road is the primary setting for the scene, providing a clear path for the vehicles and pedestrian. It is a typical urban road with visible lane markings and traffic signs.
6. **Traffic Lights**: The traffic lights are visible in the scene, indicating the intersection where the vehicles are likely to stop or proceed. They are essential for traffic management and safety.
These objects are crucial for the driver's decision-making process, as they provide context, safety, and direction for the vehicle. |
| Mini-InternVL2-2B-DA-DriveLM | There is a white SUV to the front left of the ego vehicle, a white SUV to the front of the ego vehicle, a white motorhome to the front right of the ego vehicle, a silver SUV to the back of the ego vehicle, a black SUV to the front of the ego vehicle, and a red light to the front of the ego vehicle. The IDs of these objects are as follows: 1. The white SUV to the front left of the ego vehicle, 2. The white SUV to the front of the ego vehicle, 3. The white motorhome to the front right of the ego vehicle, 4. The silver SUV to the back of the ego vehicle, 5. The black SUV to the front of the ego vehicle, and 6. The red light to the front of the ego vehicle. |
| llama_adapter_v2_multimodal7b | In the current scene, the important objects to consider for future reasoning and driving decisions include the cars, trucks, and buses on the road. The vehicles are driving in different lanes, and some of them are turning or changing lanes. The presence of these vehicles indicates that the driver should be cautious and attentive to the traffic conditions, maintain a safe distance from other vehicles, and be prepared to react to any sudden changes in direction or speed. Additionally, the presence of a traffic light suggests that the driver should also pay attention to the traffic signals and follow the rules to ensure safe and smooth traffic flow. |
| InternVL2-2b-LoRA-25k-drivelm | There is a white SUV to the front of the ego vehicle, a black SUV to the back of the ego vehicle, a white bus to the front right of the ego vehicle, a black SUV to the back of the ego vehicle, a white SUV to the front right of the ego vehicle, and a black SUV to the front of the ego vehicle. The IDs of these objects are <c1,CAM_FRONT,1080.0,525.8>, <c2,CAM_BACK,1080.0,525.8>, <c3,CAM_FRONT_RIGHT,1080.0,500.8>, <c4,CAM_BACK,1080.0,500.8>, <c5,CAM_FRONT_RIGHT,1080.0,500.8>, and <c6,CAM_FRONT,1080.0,500.8>. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | There is a white SUV to the back of the ego vehicle, a black SUV to the front of the ego vehicle, a white motorhome to the front right of the ego vehicle, a blue sedan to the front right of the ego vehicle, and a red light to the front of the ego vehicle. The IDs of these objects are <c1,CAM_BACK,743.3,593.3>, <c2,CAM_FRONT,753.5,611.3>, <c3,CAM_FRONT_RIGHT,94.9,504.2>, <c4,CAM_FRONT_RIGHT,823.1,719.9>, and <c5,CAM_FRONT,765.8,302.3>. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | There is a white SUV to the front of the ego vehicle, a black SUV to the back of the ego vehicle, a white bus to the front of the ego vehicle, a black sedan to the front of the ego vehicle, a black sedan to the front of the ego vehicle, and a black sedan to the front of the ego vehicle. The IDs of these objects are <c1,CAM_FRONT,1011.7,495.0>, <c2,CAM_BACK,1010.0,525.0>, <c3,CAM_FRONT,1013.3,510.0>, <c4,CAM_FRONT,1010.0,500.0>, <c5,CAM_FRONT,1010.0,500.0>, and <c6,CAM_FRONT,1013.3,500.0>. |
| InternVL2-2b-LoRA-300k-drivelm | There is a black SUV to the front of the ego vehicle, a white SUV to the front of the ego vehicle, a black SUV to the back of the ego vehicle, a white bus to the front of the ego vehicle, a black sedan to the front of the ego vehicle, a black sedan to the back of the ego vehicle, a black sedan to the front of the ego vehicle, a black sedan to the back of the ego vehicle, a black sedan to the front of the ego vehicle, and a black sedan to the back of the ego vehicle. The IDs of these objects are <c1,CAM_FRONT,1080.0,520.0>, <c2,CAM_FRONT,1080.0,505.8>, <c3,CAM_BACK,1010.8,525.8>, <c4,CAM_FRONT,1010.8,500.8>, <c5,CAM_BACK,1010.8,525.8>, <c6,CAM_BACK,1010.8,525.8>, <c7,CAM_FRONT,1010.8,500.8>, <c8,CAM_BACK,1010.8,525.8>, and <c9,CAM_BACK,1010.8,525.8>. |
| llava-v1.6-mistral-7b-hf | In the current scene, the important objects to consider for future reasoning and driving decisions include:
1. Vehicles: There are several cars and a bus, which are the primary objects that drivers need to be aware of.
2. Pedestrians: There are at least two pedestrians visible in the scene, which drivers should be cautious of when making turns or changing lanes.
3. Traffic lights: There are traffic lights at various points in the scene, which drivers must obey to ensure safe navigation through the intersection.
4. Road markings: The road has clear lane markings, which drivers should pay attention to when making turns or changing lanes.
5. Buildings: The presence of buildings indicates that this is an urban environment, which may have additional traffic rules and regulations that drivers should be aware of.
6. Cyclists: There is a cyclist visible in the scene, which drivers should be cautious of when making turns or changing lanes.
7. Fire hydrants: There is a fire hydrant visible on the sidewalk, which drivers should be aware of to avoid parking or driving too close to it.
8. Street signs: There are street signs visible in the scene, which drivers should be aware of to navigate the area correctly.
9. Parking meters: There is a parking meter visible on the sidewalk, which drivers should be aware of if they are parking in the area.
10. Trash cans: There are trash cans visible on the sidewalk, which drivers should be aware of to avoid parking or driving too close to them.
These objects are crucial for drivers to navigate the scene safely and efficiently. |
| 1 | Question (prediction) | In this scenario, what object is most likely to consider <c5,CAM_FRONT,765.8,302.3>? |
| GT Answer | <c2,CAM_FRONT,753.5,611.3>. |
| InternVL2-2B | In this scenario, the object most likely to consider `<c5,CAM_FRONT,765.8,302.3>` is the white SUV in the top left corner of the image. This is because the SUV is positioned in the foreground, directly in front of the camera, and is the only object in the image that is clearly visible and occupies the foreground. |
| Mini-InternVL2-2B-DA-DriveLM | The ego vehicle. |
| llama_adapter_v2_multimodal7b | In this scenario, the most likely object to consider is the car, as it is the primary mode of transportation in the image and is present in all four photos. |
| InternVL2-2b-LoRA-25k-drivelm | <c4,CAM_FRONT,1080.0,525.8>. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | <c2,CAM_FRONT,753.5,611.3>. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | <c1,CAM_FRONT,1080.0,525.8>. |
| InternVL2-2b-LoRA-300k-drivelm | The ego vehicle. |
| llava-v1.6-mistral-7b-hf | In the image provided, the object that is most likely to be considered as <c5,CAM_FRONT,765.8,302.3> is the white van in the front row, on the left side of the image. This is because it is the only vehicle in the front row on the left side of the image, and it is facing the camera directly. |
| 2 | Question (prediction) | Is <c1,CAM_BACK,743.3,593.3> a traffic sign or a road barrier? |
| GT Answer | No. |
| InternVL2-2B | To determine if the coordinates <c1,CAM_BACK,743.3,593.3> represent a traffic sign or a road barrier, let's analyze the image in detail:
1. **Location and Orientation**:
- The coordinates are given in the format (x, y, z), where x, y, and z are the coordinates of the point in the image.
- The coordinates provided are (743.3, 593.3), which are in the third quadrant of the image.
2. **Visual Inspection**:
- **Traffic Sign**: A traffic sign typically has a distinct shape, color, or text that indicates a specific rule or regulation.
- **Road Barrier**: A road barrier is usually a physical structure that separates the road from the surrounding area, such as a guardrail or a barrier wall.
3. **Contextual Analysis**:
- The coordinates are located in the third quadrant of the image, which is not typical for traffic signs or road barriers. Typically, traffic signs and road barriers are placed in the top-left, bottom-right, or top-right quadrants of the image.
- The coordinates are also not aligned with any recognizable shapes or text that would indicate a specific type of sign or barrier.
4. **Conclusion**:
- Given the coordinates (743.3, 593.3) and the typical placement of traffic signs and road barriers, it is more likely that this point represents a road barrier rather than a traffic sign.
In conclusion, the coordinates <c1,CAM_BACK,743.3,593.3> most likely represent a road barrier, as they are not aligned with typical traffic sign shapes or text. |
| Mini-InternVL2-2B-DA-DriveLM | No. |
| llama_adapter_v2_multimodal7b | The image does not provide enough information to confidently determine whether the <c1,CAM_BACK,743.3,593.3> is a traffic sign or a road barrier. It could be either, depending on the context and the specific location within the image. |
| InternVL2-2b-LoRA-25k-drivelm | No. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | No. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | No. |
| InternVL2-2b-LoRA-300k-drivelm | No. |
| llava-v1.6-mistral-7b-hf | The image you've provided appears to be a collage of different street scenes, and it includes a variety of vehicles and traffic signs. However, the specific image you've mentioned, <c1,CAM_BACK,743.3,593.3>, does not correspond to any recognizable traffic sign or road barrier in the standard traffic signage system.
It's possible that the image is a screenshot from a video or a photo taken from a vehicle's rear camera, and the numbers and letters are part of the file name or metadata. Without additional context, it's difficult to determine the exact nature of the image. |
| 3 | Question (planning) | What actions could the ego vehicle take based on <c1,CAM_BACK,743.3,593.3>? Why take this action and what's the probability? |
| GT Answer | The action is to remain stationary. The reason for this action is to follow the traffic rules. The probability of taking this action is high. |
| InternVL2-2B | The ego vehicle, represented by the car in the center of the image, is positioned at the intersection of a two-way street. Based on the given input data, the ego vehicle can take the following actions:
1. **Turn Left**: The ego vehicle is currently facing a left turn lane. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle making a left turn. This action is likely to be taken if the ego vehicle is approaching the intersection and needs to change lanes to the left.
2. **Turn Right**: The ego vehicle is also facing a right turn lane. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle making a right turn. This action is likely to be taken if the ego vehicle is approaching the intersection and needs to change lanes to the right.
3. **Go Straight**: The ego vehicle is currently on a two-way street. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle continuing straight through the intersection. This action is likely to be taken if the ego vehicle is approaching the intersection and does not need to change lanes.
4. **Go Straight**: The ego vehicle is currently on a two-way street. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle continuing straight through the intersection. This action is likely to be taken if the ego vehicle is approaching the intersection and does not need to change lanes.
5. **Go Straight**: The ego vehicle is currently on a two-way street. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle continuing straight through the intersection. This action is likely to be taken if the ego vehicle is approaching the intersection and does not need to change lanes.
6. **Go Straight**: The ego vehicle is currently on a two-way street. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle continuing straight through the intersection. This action is likely to be taken if the ego vehicle is approaching the intersection and does not need to change lanes.
7. **Go Straight**: The ego vehicle is currently on a two-way street. The input data indicates a probability of 0.593, suggesting a high likelihood of the ego vehicle continuing straight through the intersection. This action is likely to be taken if the ego vehicle is approaching the intersection and does not need to change lanes.
8 |
| Mini-InternVL2-2B-DA-DriveLM | The action is to remain stationary. The reason for this action is to follow the traffic rules, which has a high probability. |
| llama_adapter_v2_multimodal7b | Based on the given input, the ego vehicle could take the action of turning right at the intersection. This action is likely because the vehicle is approaching the intersection and the right turn is the most appropriate maneuver to make, considering the traffic situation and the vehicle's intended destination. The probability of the vehicle taking this action is high, as it is a common and safe driving practice in many countries. However, it is important to note that the actual probability of the vehicle taking this action may vary depending on factors such as the specific traffic conditions, road layout, and the vehicle's driver's experience and decision-making skills. |
| InternVL2-2b-LoRA-25k-drivelm | The action is to keep going at the same speed. The reason is that there is no safety issue, and the probability is high. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | The action is to remain stationary. The reason for this action is to follow the traffic rules, which has a high probability. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| InternVL2-2b-LoRA-300k-drivelm | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| llava-v1.6-mistral-7b-hf | The image you've provided appears to be a sequence of six images showing a vehicle's perspective as it moves through an intersection. The sequence is labeled as "CAM_BACK_LEFT," "CAM_BACK_LEFT," "CAM_BACK_LEFT," "CAM_BACK_LEFT," "CAM_BACK_LEFT," and "CAM_BACK_LEFT."
Based on the label "CAM_BACK_LEFT," it seems that the vehicle is equipped with a camera on the left side of the vehicle, which is likely used for monitoring traffic conditions behind the vehicle.
The actions that the ego vehicle could take based on the information from the camera are:
1. **Slow down or stop**: If the camera detects a vehicle or pedestrian too close or in a dangerous position, the driver should slow down or stop to avoid a collision.
2. **Change lanes**: If the camera detects a vehicle in the adjacent lane that is too close or in a dangerous position, the driver may need to change lanes to maintain a safe distance.
3. **Signal and turn**: If the camera detects a vehicle or pedestrian in the adjacent lane that is too close or in a dangerous position, the driver may need to signal their intention to turn and then execute the turn to avoid a collision.
4. **Maintain a safe distance**: If the camera detects a vehicle or pedestrian in the adjacent lane that is too close or in a dangerous position, the driver should maintain a safe distance from the vehicle in front to avoid a collision.
The probability of these actions depends on the specific situation captured in the images. If the camera is detecting a vehicle or pedestrian that is too close or in a dangerous position, the driver should take immediate action to ensure safety. However, without more context or information about the specific situation, it's not possible to provide a precise probability for each action. |
| 4 | Question (planning) | What actions taken by the ego vehicle can lead to a collision with <c1,CAM_BACK,743.3,593.3>? |
| GT Answer | Back up. |
| InternVL2-2B | The ego vehicle, a black SUV, is positioned on the right side of the road, approaching a white van that is on the left side of the road. The white van is positioned directly in front of the ego vehicle, and it appears to be moving towards the right side of the road.
Given the following observations:
1. The white van is on the left side of the road.
2. The ego vehicle is on the right side of the road.
3. The white van is moving towards the right side of the road.
The ego vehicle must have missed a crucial moment to avoid a collision. Here are the potential actions that could have prevented the collision:
1. **Slow Down**: The ego vehicle could have reduced its speed to avoid the white van. This would have allowed the ego vehicle to react more quickly to the white van's movement.
2. **Signal**: The ego vehicle could have used its turn signals to alert the white van of its presence. This would have allowed the white van to prepare to stop or change lanes.
3. **Brake**: The ego vehicle could have applied the brakes to slow down or stop immediately. This would have prevented the collision if the white van had already started to move.
4. **Avoid**: The ego vehicle could have attempted to avoid the collision by maneuvering around the white van. This would have required the ego vehicle to change lanes or make a U-turn.
In summary, the ego vehicle could have taken any of these actions to avoid a collision with the white van. |
| Mini-InternVL2-2B-DA-DriveLM | Back up. |
| llama_adapter_v2_multimodal7b | The ego vehicle is driving in the wrong direction on a one-way street, which can lead to a collision with the other vehicles on the street. In the image, there are multiple cars, a truck, and a motorcycle on the street. If the ego vehicle continues to drive in the wrong direction, it will collide with the other vehicles, potentially causing accidents and damage to the vehicles involved. Additionally, the ego vehicle is driving close to the motorcycle, which increases the risk of an accident if the driver does not take appropriate precautions or if the motorcycle needs to suddenly change direction or brake. |
| InternVL2-2b-LoRA-25k-drivelm | Back up. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | Brake suddenly. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | Brake suddenly. |
| InternVL2-2b-LoRA-300k-drivelm | Back up. |
| llava-v1.6-mistral-7b-hf | The image you've provided shows a sequence of actions that can lead to a collision between the ego vehicle (the one with the camera mounted on it) and another vehicle, specifically a white van, in a city street setting. Here's a breakdown of the actions shown in the image:
1. **CAM_BACK_LEFT**: The ego vehicle is driving down the street, and the white van is in the left lane ahead of it.
2. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
3. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
4. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
5. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
6. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
7. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
8. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
9. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
10. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
11. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
12. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
13. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
14. **CAM_BACK_LEFT**: The ego vehicle is still in the left lane, and the white van is still ahead of it.
15. **CAM_BACK_LEFT**: |
| 5 | Question (planning) | In this scenario, what are safe actions to take for the ego vehicle? |
| GT Answer | Slightly offset to the left and turn left. |
| InternVL2-2B | Based on the image, the safe actions for the ego vehicle would include maintaining a safe following distance from the vehicle in front of it, as indicated by the "CAM_FRONT_LEFT" and "CAM_FRONT_RIGHT" labels. The vehicle should also be prepared to stop if necessary, as suggested by the "CAM_FRONT_LEFT" label, to avoid a collision. Additionally, the vehicle should be attentive to the traffic signals and road signs, as indicated by the "CAM_FRONT_LEFT" and "CAM_FRONT_RIGHT" labels, to ensure it is following the correct traffic flow. |
| Mini-InternVL2-2B-DA-DriveLM | Slightly offset to the left, slightly offset to the right, and brake gently to a stop are safe actions to take for the ego vehicle in this scenario. |
| llama_adapter_v2_multimodal7b | In the image, there are several vehicles on the road, including cars, trucks, and a bus. The ego vehicle is driving down the street, and it is crucial for the driver to take safe actions to ensure the safety of themselves and others on the road. Some of these actions include: 1. Maintaining a safe distance from other vehicles, especially larger ones like trucks and buses, to avoid collisions. 2. Being cautious when changing lanes or making turns, as there are other vehicles in close proximity. 3. Paying attention to traffic signals and signs, following the rules of the road, and being prepared for any sudden changes in traffic flow. 4. Keeping an eye on blind spots and mirrors to be aware of any vehicles that may be hidden from view. 5. Avoiding distractions, such as using mobile devices or engaging in other activities that could divert attention from the road. By following these safe actions, the ego vehicle driver can minimize the risk of accidents and ensure a smooth and secure driving experience. |
| InternVL2-2b-LoRA-25k-drivelm | Keep going at the same speed, accelerate, and proceed. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | Brake gently to a stop, slightly offset to the left. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | Keep going at the same speed, decelerate gradually without braking. |
| InternVL2-2b-LoRA-300k-drivelm | Keep going at the same speed, decelerate gradually without braking. |
| llava-v1.6-mistral-7b-hf | In the given scenario, the ego vehicle is a white truck driving down a busy city street with various vehicles, including cars, a bus, and a motorcycle. To ensure safe driving, the following actions should be taken:
1. **Maintain a Safe Following Distance**: Keep a safe distance from the vehicle in front to allow for adequate reaction time in case of sudden stops or changes in traffic flow.
2. **Stay in the Right Lane**: Since the image shows a one-way street, the truck should stay in the right lane to avoid head-on collisions with oncoming traffic.
3. **Be Aware of Surroundings**: The driver should be aware of the position of other vehicles, including the bus and the motorcycle, to avoid any potential hazards.
4. **Use Turn Signals**: When changing lanes or making turns, the driver should use turn signals to communicate their intentions to other road users.
5. **Obey Traffic Signs and Signals**: The driver should follow all traffic signs and signals, including speed limits, stop signs, and traffic lights.
6. **Be Prepared for Pedestrians**: The driver should be prepared for pedestrians crossing the street, especially at intersections.
7. **Use Mirrors and Blind Spot Checks**: Regularly check mirrors and blind spots to ensure no vehicles are in the truck's blind spots.
8. **Slow Down in Busy Areas**: The driver should slow down in areas with heavy pedestrian traffic or where there are likely to be bus stops, as shown in the image.
9. **Be Cautious at Intersections**: The driver should be extra cautious when approaching intersections, as other vehicles may not be following the rules of the road.
10. **Stay Alert**: The driver should stay alert and focused on the road at all times.
By following these safety guidelines, the driver can help ensure a safe journey for themselves and others on the road. |
| 6 | Question (behavior) | Predict the behavior of the ego vehicle. Please select the correct answer from the following options: A. The ego vehicle is slightly steering to the left. The ego vehicle is driving very fast. B. The ego vehicle is going straight. The ego vehicle is not moving. C. The ego vehicle is slightly steering to the right. The ego vehicle is driving very fast. D. The ego vehicle is going straight. The ego vehicle is driving slowly. |
| GT Answer | B |
| InternVL2-2B | Based on the image, the ego vehicle is driving straight. The vehicle is positioned in the center of the road, with no visible signs of deviation from its intended path. |
| Mini-InternVL2-2B-DA-DriveLM | B |
| llama_adapter_v2_multimodal7b | The ego vehicle is driving very fast, as it is seen steering to the left and passing other cars on the road. |
| InternVL2-2b-LoRA-25k-drivelm | B |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | B |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | B |
| InternVL2-2b-LoRA-300k-drivelm | C. The ego vehicle is slightly steering to the right. The ego vehicle is driving very fast. |
| llava-v1.6-mistral-7b-hf | B. The ego vehicle is going straight. The ego vehicle is driving very fast. |