| 0 | Question (perception) | Identify all the traffic elements in the front view, categorize them, determine their status, and predict the bounding box around each one. The output should be a list formatted as (c, s, x1, y1, x2, y2), where c represents the category, s denotes the status, and x1, y1, x2, y2 are the offsets of the top-left and bottom-right corners of the box relative to the center point. |
| GT Answer | There are many traffic elements in the front view. The information of these traffic elements is [(traffic light, unknown, 1565.73, 158.82, 1600.0, 234.23), (traffic light, unknown, 1418.3, 211.19, 1450.94, 280.43), (traffic light, unknown, 1234.88, 239.97, 1290.78, 303.59), (traffic light, unknown, 927.82, 256.62, 950.71, 306.49), (traffic light, unknown, 892.7, 233.39, 916.41, 286.53), (traffic light, green, 830.99, 203.87, 854.29, 280.31), (traffic light, green, 831.41, 203.83, 854.65, 280.98), (traffic light, green, 927.22, 342.2, 953.3, 399.3), (traffic light, green, 927.52, 342.18, 952.52, 398.93), (traffic light, red, 967.57, 410.36, 982.4, 421.42)]. |
| InternVL2-2B | To identify and categorize the traffic elements in the front view, we'll break down the image into distinct categories and provide the bounding boxes for each:
1. **White SUV (CAM_FRONT_LEFT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
2. **White SUV (CAM_FRONT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
3. **White SUV (CAM_FRONT_RIGHT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
4. **White Truck (CAM_FRONT_RIGHT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
5. **White Car (CAM_FRONT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
6. **White Car (CAM_FRONT_LEFT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
7. **White Car (CAM_FRONT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
8. **White Car (CAM_FRONT_RIGHT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
9. **White Car (CAM_FRONT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
10. **White Car (CAM_FRONT_LEFT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
11. **White Car (CAM_FRONT_RIGHT)**:
- Category: Vehicle
- Status: In motion
- Bounding Box: (0, 0, 0, 0)
12. **White Car (CAM_FRONT)**:
- Category: Vehicle
- Status: In motion |
| Mini-InternVL2-2B-DA-DriveLM | There are many traffic elements in the front view. The information of these traffic elements is [(traffic light[[506, 113, 511, 150]], green), (traffic light[[536, 113, 541, 150]], green), (traffic light[[557, 115, 563, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110, 622, 150]], green), (traffic light[[613, 110, 622, 150]], red), (traffic light[[613, 110 |
| llama_adapter_v2_multimodal7b | The traffic elements in the front view are:
1. Car (c = 1)
2. Truck (c = 2)
3. Bus (c = 3)
4. Traffic light (c = 4)
5. Street sign (c = 5)
6. Building (c = 6)
The status of each element is:
1. Car: moving (s = 1)
2. Truck: moving (s = 1)
3. Bus: moving (s = 1)
4. Traffic light: green (s = 1)
5. Street sign: visible (s = 1)
6. Building: visible (s = 1) |
| InternVL2-2b-LoRA-25k-drivelm | There is one traffic element in the front view, which is a white SUV to the front of the ego vehicle, so the category is "S", the status is "S", the offsets of the top-left corner are (1080, 310), the bottom-right corner is (1080, 480), and the offsets of the top-left corner are (1080, 480), (1080, 310). |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | There are many traffic elements in the front view. The information of these traffic elements is [(traffic light, green, 747.19, 0.58, 779.48, 99.31), (traffic light, green, 1045.42, 75.0, 1073.79, 159.26), (traffic light, unknown, 1045.77, 75.0, 1073.37, 159.26), (traffic light, green, 1544.31, 179.98, 1592.63, 293.31), (traffic light, unknown, 1544.31, 179.98, 1592.63, 293.31), (traffic light, green, 1453.51, 300.83, 1478.7, 383.44), (traffic light, unknown, 1453.51, 300.83, |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | There are two traffic elements in the front view: a car and a truck. The car is a traffic sign, while the truck is a traffic sign. The car is a traffic sign, while the truck is a traffic sign. The bounding boxes are as follows: (0, 0, 1080, 430), (1080, 0, 1200, 430), (1080, 430, 1200, 530), (1080, 530, 1200, 630). |
| InternVL2-2b-LoRA-300k-drivelm | There are many traffic elements in the front view. The information of these traffic elements is [(traffic light, green, 1015.0, 310.0, 1045.0, 340.0), (traffic light, green, 1045.0, 310.0, 1055.0, 340.0), (traffic light, green, 1055.0, 310.0, 1065.0, 340.0), (traffic light, green, 1065.0, 310.0, 1075.0, 340.0), (traffic light, green, 1075.0, 310.0, 1085.0, 340.0), (traffic light, green, 1085.0, 310.0, 1095.0, 340.0), (traffic light, green, 1095.0, 310.0, 1105.0, 340.0), (traffic light, green, 1115.0, 310.0, 1125.0, 340.0), (traffic light, green, 1125.0, 310.0, 1135.0, 340.0), (traffic light, green, 1135.0, 310.0, 1145.0, 340.0), (traffic light, green, 1145.0, 310.0, 1155.0, 340.0), (traffic light, green, 1155.0, 310.0, 1165.0, 340.0), (traffic light, green, 1165.0, 310.0, 1175.0, 340.0), (traffic light, green, 1175.0, 310.0, 1185.0, 340.0), (traffic light, green, 1185.0, 310.0, 1195.0, 340.0), (traffic light, green, 1195.0, 310.0, 1205.0, 340.0), (traffic light, green, 1205.0, 310.0, 1115.0, 340.0), (traffic light, green, 1115.0, 310.0, 11 |
| llava-v1.6-mistral-7b-hf | The image provided is a collage of several different scenes, each depicting a different traffic element. Due to the nature of the collage, it is not possible to provide a single list of all the traffic elements in the front view, as each scene has its own set of elements. However, I can describe the traffic elements in each scene and provide a list for each individual scene.
1. Top Left:
- Category: Car
- Status: Moving
- Bounding Box: (0.156,0.169,0.456,0.487)
2. Top Right:
- Category: Car
- Status: Moving
- Bounding Box: (0.500,0.169,0.734,0.487)
3. Bottom Left:
- Category: Car
- Status: Moving
- Bounding Box: (0.000,0.508,0.208,0.988)
4. Bottom Right:
- Category: Car
- Status: Moving
- Bounding Box: (0.500,0.508,0.734,0.988)
5. Middle Left:
- Category: Car
- Status: Moving
- Bounding Box: (0.156,0.169,0.456,0.487)
6. Middle Right:
- Category: Car
- Status: Moving
- Bounding Box: (0.500,0.169,0.734,0.487)
7. Bottom Left:
- Category: Car
- Status: Moving
- Bounding Box: (0.000,0.508,0.208,0.988)
8. Bottom Right:
- Category: Car
- Status: Moving
- Bounding Box: (0.500,0.508,0.734,0.988)
Please note that |
| 1 | Question (perception) | What is the moving status of object <c2,CAM_FRONT_LEFT,1045.0,596.3>? Please select the correct answer from the following options: A. Stopped. B. Drive backward. C. Going ahead. D. Turn right. |
| GT Answer | C |
| InternVL2-2B | The moving status of object <c2,CAM_FRONT_LEFT,1045.0,596.3> is C. Going ahead. |
| Mini-InternVL2-2B-DA-DriveLM | C |
| llama_adapter_v2_multimodal7b | The moving status of object <c2,CAM_FRONT_LEFT,1045.0,596.3> is "Going ahead." |
| InternVL2-2b-LoRA-25k-drivelm | C |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | C |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | C |
| InternVL2-2b-LoRA-300k-drivelm | C. Going ahead. |
| llava-v1.6-mistral-7b-hf | A. Stopped |
| 2 | Question (prediction) | What object should the ego vehicle notice first when the ego vehicle is getting to the next possible location? What is the state of the object that is first noticed by the ego vehicle and what action should the ego vehicle take? What object should the ego vehicle notice second when the ego vehicle is getting to the next possible location? What is the state of the object perceived by the ego vehicle as second and what action should the ego vehicle take? What object should the ego vehicle notice third? What is the state of the object perceived by the ego vehicle as third and what action should the ego vehicle take? |
| GT Answer | Firstly, notice that <c4,CAM_FRONT,840.4,239.7>. The object is a traffic sign, so the ego vehicle should keep going ahead at the same speed. Secondly, notice that <c2,CAM_FRONT_LEFT,1045.0,596.3>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Thirdly, notice that <c3,CAM_BACK_LEFT,352.3,402.8>. The object is stationary, so the ego vehicle should keep going ahead at the same speed. |
| InternVL2-2B | When the ego vehicle is getting to the next possible location, it should first notice the traffic light at the intersection. The state of the traffic light is green, indicating that vehicles can proceed. The ego vehicle should then proceed through the intersection. |
| Mini-InternVL2-2B-DA-DriveLM | Firstly, notice that the object in front of the ego vehicle is a traffic sign. The ego vehicle should continue going ahead at the same speed. Secondly, notice that the object in front of the ego vehicle is stationary. The ego vehicle should continue going ahead at the same speed. Thirdly, notice that the object in front of the ego vehicle is a traffic sign. The ego vehicle should continue going ahead at the same speed. |
| llama_adapter_v2_multimodal7b | When the ego vehicle is getting to the next possible location, it should first notice the truck in front of it. The ego vehicle should then take appropriate action, such as slowing down or changing lanes if necessary, to maintain a safe distance from the truck. Next, the ego vehicle should notice the car on the right side of the road. The ego vehicle should then take appropriate action, such as checking its blind spot or changing lanes if necessary, to ensure safe navigation. Finally, the ego vehicle should notice the traffic light. The ego vehicle should then come to a complete stop if the traffic light is red, and proceed with caution if the traffic light is green. |
| InternVL2-2b-LoRA-25k-drivelm | Firstly, notice that <c3,CAM_FRONT,1080.8,525.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Secondly, notice that <c2,CAM_FRONT,1080.8,505.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Thirdly, notice that <c1,CAM_FRONT,1080.8,505.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | Firstly, notice that <c2,CAM_FRONT_LEFT,1045.0,596.3>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Secondly, notice that <c4,CAM_FRONT,840.4,239.7>. The object is a traffic sign, so the ego vehicle should keep going ahead at the same speed. Thirdly, notice that <c3,CAM_BACK_LEFT,352.3,402.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | Firstly, notice <c1, CAM_FRONT, 1200.0, 600.0>. The object is approaching, so the ego should slow down. Secondly, notice <c2, CAM_FRONT, 1200.0, 600.0>. The object is approaching, so the ego should slow down. Thirdly, notice <c3, CAM_FRONT, 1200.0, 600.0>. The object is approaching, so the ego should slow down. |
| InternVL2-2b-LoRA-300k-drivelm | Firstly, notice that <c3,CAM_FRONT,1080.0,520.0>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Secondly, notice that <c1,CAM_FRONT,1080.0,505.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. Thirdly, notice that <c2,CAM_BACK,1010.0,555.8>. The object is going ahead, so the ego vehicle should keep going ahead at the same speed. |
| llava-v1.6-mistral-7b-hf | The image you've provided appears to be a collage of different street scenes, each with various objects that the ego vehicle should notice when approaching the next possible location. Here's a breakdown of what the ego vehicle should notice and what actions it should take:
1. **First Object**: The first object the ego vehicle should notice is the traffic light. The state of the traffic light is red, indicating that the ego vehicle should stop.
2. **Second Object**: The second object the ego vehicle should notice is the pedestrian crossing. The state of the pedestrian crossing is clear, suggesting that the ego vehicle should proceed with caution, checking for pedestrians before moving forward.
3. **Third Object**: The third object the ego vehicle should notice is the truck. The truck is parked, so the ego vehicle should be aware of its presence and proceed with caution, especially if there is limited visibility around the truck.
4. **Fourth Object**: The fourth object the ego vehicle should notice is the potted plant. The plant is on the sidewalk, and the ego vehicle should be aware of it to avoid any potential damage to the plant or the sidewalk.
5. **Fifth Object**: The fifth object the ego vehicle should notice is the building. The building is in the background, and the ego vehicle should be aware of its presence to navigate the street correctly.
6. **Sixth Object**: The sixth object the ego vehicle should notice is the car. The car is parked, and the ego vehicle should be aware of its presence to avoid any potential damage to the car or the sidewalk.
7. **Seventh Object**: The seventh object the ego vehicle should notice is the bench. The bench is on the sidewalk, and the ego vehicle should be aware of its presence to avoid any potential damage to the bench or the sidewalk.
8. **Eighth Object**: The eighth object the ego vehicle should notice is the truck. The truck is parked, and the ego vehicle should be aware of its presence to avoid any potential damage to the truck or the sidewalk.
9. **Ninth Object**: The ninth object the ego vehicle should notice is the potted plant. The plant is on the sidewalk, and the ego vehicle should be aware of its presence to avoid any potential damage to the plant or the sidewalk.
10. **Tenth Object**: The tenth object the ego vehicle should notice is |
| 3 | Question (prediction) | Is <c1,CAM_BACK,920.0,645.0> a traffic sign or a road barrier? |
| GT Answer | No. |
| InternVL2-2B | To determine if the coordinates <c1,CAM_BACK,920.0,645.0> represent a traffic sign or a road barrier, let's analyze the image in detail:
1. **Location and Orientation**:
- The coordinates are given in the format (x, y, z), where x, y, and z are the coordinates of the point in the image.
- The point is located at (920.0, 645.0).
2. **Visual Inspection**:
- **Traffic Sign**: A traffic sign typically has a distinct shape, color, or text that indicates a specific rule or regulation. In this image, there is no such distinct feature.
- **Road Barrier**: A road barrier is typically a physical structure that separates the road from the sidewalk or other areas. In this image, there is no visible barrier.
3. **Contextual Analysis**:
- The coordinates are in the bottom right corner of the image, which is consistent with the typical location of a traffic sign or road barrier.
- The surrounding area does not show any signs of a road barrier or sidewalk, which would typically be present in such coordinates.
4. **Conclusion**:
- Given the coordinates and the visual inspection, it is reasonable to conclude that the coordinates <c1,CAM_BACK,920.0,645.0> represent a road barrier.
In conclusion, the coordinates <c1,CAM_BACK,920.0,645.0> are most likely a road barrier, as there is no visible traffic sign or other typical road barrier features in the image. |
| Mini-InternVL2-2B-DA-DriveLM | No. |
| llama_adapter_v2_multimodal7b | The image does not provide enough information to confidently determine whether the <c1,CAM_BACK,920.0,645.0> is a traffic sign or a road barrier. It could be either, depending on the context and the specific location within the image. |
| InternVL2-2b-LoRA-25k-drivelm | No. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | No. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | No. |
| InternVL2-2b-LoRA-300k-drivelm | No. |
| llava-v1.6-mistral-7b-hf | The image you've provided appears to be a collage of different street scenes, and it includes a traffic sign. The sign in question is a "CAM_BACK" sign, which is typically used to indicate that a camera is monitoring the area for traffic enforcement purposes. It is not a physical barrier but rather a warning to drivers that they are being recorded. |
| 4 | Question (planning) | What actions could the ego vehicle take based on <c1,CAM_BACK,920.0,645.0>? Why take this action and what's the probability? |
| GT Answer | The action is to keep going at the same speed. The reason is that there is no safety issue, and the probability is high. |
| InternVL2-2B | The ego vehicle, represented by the white SUV in the image, is currently positioned at a crosswalk. Based on the given information, the actions it could take are:
1. **Stop**: The vehicle could potentially stop at the crosswalk to ensure the safety of pedestrians and other vehicles.
2. **Wait**: The vehicle could also wait at the crosswalk, possibly to avoid any potential hazards or to wait for a signal to proceed.
3. **Go**: The vehicle could potentially proceed through the crosswalk, although this action is not explicitly indicated by the given information.
The probability of taking this action can be inferred from the context provided by the image. The presence of pedestrians and the fact that the vehicle is at a crosswalk suggest that stopping or waiting is more likely. However, without additional context or information about the surrounding environment, it is not possible to determine the exact probability with certainty. The image alone does not provide enough information to determine the likelihood of proceeding through the crosswalk. |
| Mini-InternVL2-2B-DA-DriveLM | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| llama_adapter_v2_multimodal7b | Based on the given input, the ego vehicle could take the action of turning right, as indicated by the <c1,CAM_BACK,920.0,645.0> command. This action is likely taken to navigate the vehicle around the corner or to change lanes. The probability of the vehicle taking this action is high, as the vehicle is already in motion and approaching the corner, making it necessary to turn right to continue its path. |
| InternVL2-2b-LoRA-25k-drivelm | The action is to keep going at the same speed. The reason is that there is no safety issue, and the probability is high. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| InternVL2-2b-LoRA-300k-drivelm | The action is to keep going at the same speed. The reason is that there is no safety issue. The probability is high. |
| llava-v1.6-mistral-7b-hf | The image you've provided appears to be a sequence of images showing a vehicle's perspective at different points in time, likely from a camera mounted on the back of the vehicle. The sequence is labeled "CAM_BACK," which suggests that the camera is facing backward, capturing the view of the road behind the vehicle.
Based on the label "CAM_BACK,920.0,645.0," it seems to indicate that the vehicle is at a specific location (920.0,645.0) and is facing a certain direction (back).
Given that the vehicle is at a stoplight, the most likely action the ego vehicle could take is to proceed forward when the traffic light turns green. The probability of this action depends on several factors:
1. The traffic light's current state: If the traffic light is currently red, the vehicle would need to wait until it turns green before proceeding. If it's yellow, the driver might choose to proceed with caution, and if it's green, the vehicle would proceed forward.
2. The presence of other vehicles and pedestrians: The driver must be aware of the traffic conditions behind and in front of the vehicle. If there are other vehicles or pedestrians that have the right of way, the driver must yield to them.
3. The driver's decision-making: The driver must consider the traffic conditions, the road ahead, and any potential hazards before proceeding.
4. The vehicle's mechanical condition: The driver must ensure that the vehicle is in good working condition and is safe to operate.
5. Traffic rules and regulations: The driver must follow the local traffic laws and regulations.
The probability of the vehicle proceeding forward when the traffic light turns green would be high, assuming that the driver is following the rules of the road and the vehicle is in good working condition. However, without more context or information about the current state of the traffic light and the surrounding environment, it's impossible to provide a precise probability. |
| 5 | Question (planning) | What actions taken by the ego vehicle can lead to a collision with <c1,CAM_BACK,920.0,645.0>? |
| GT Answer | Brake suddenly. |
| InternVL2-2B | The ego vehicle, a white SUV, is approaching a pedestrian crossing. Here are the actions that could lead to a collision with the vehicle in the image:
1. **Speeding**: The SUV is moving at a high speed, which could cause it to cross the pedestrian crossing in a fraction of a second, potentially leading to a collision.
2. **Lane Departure**: The SUV might attempt to change lanes without yielding to the pedestrian, which could result in a collision if the pedestrian is not prepared to move.
3. **Driving Carelessly**: The SUV could be driving recklessly, such as swerving or making sudden turns, which could lead to a collision with the pedestrian.
4. **Not Stopping**: The SUV might not stop at the pedestrian crossing, failing to yield to the pedestrian, which could result in a collision.
5. **Not Braking**: The SUV might not apply the brakes in time to avoid a collision with the pedestrian, especially if it is not paying attention to the pedestrian's presence.
These actions, if executed, could lead to a collision with the pedestrian crossing, as depicted in the image. |
| Mini-InternVL2-2B-DA-DriveLM | No such action will lead to a collision. |
| llama_adapter_v2_multimodal7b | The ego vehicle is driving down the street, and it is approaching a large truck. The actions taken by the ego vehicle, such as changing lanes, making a sudden turn, or not maintaining a safe distance from the truck, can lead to a collision with the truck. In the image, the ego vehicle is seen changing lanes, which could potentially put it in the path of the truck. To avoid a collision, the ego vehicle should maintain a safe distance from the truck, follow traffic rules, and be aware of the truck's movements and blind spots. |
| InternVL2-2b-LoRA-25k-drivelm | Back up. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | Back up. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | No such action will lead to a collision. |
| InternVL2-2b-LoRA-300k-drivelm | Back up. |
| llava-v1.6-mistral-7b-hf | The image you've provided shows a sequence of actions that can lead to a collision with the vehicle in front, which is identified as <c1,CAM_BACK,920.0,645.0>. Here's a breakdown of the actions shown in the sequence:
1. **CAM_BACK_LEFT**: The ego vehicle is moving to the left, which is a safe maneuver to change lanes or make a turn.
2. **CAM_BACK_RIGHT**: The vehicle continues to move to the right, which is also a safe action to maintain the lane change or turn.
3. **CAM_BACK_LEFT**: The vehicle is now moving back to the left, which is a safe action to return to the original lane or to correct a lane change.
4. **CAM_BACK_RIGHT**: The vehicle is moving back to the right, which is a safe action to return to the original lane or to correct a lane change.
5. **CAM_BACK_LEFT**: The vehicle is moving back to the left, which is a safe action to return to the original lane or to correct a lane change.
6. **CAM_BACK_RIGHT**: The vehicle is moving back to the right, which is a safe action to return to the original lane or to correct a lane change.
7. **CAM_BACK_LEFT**: The vehicle is moving back to the left, which is a safe action to return to the original lane or to correct a lane change.
8. **CAM_BACK_RIGHT**: The vehicle is moving back to the right, which is a safe action to return to the original lane or to correct a lane change.
9. **CAM_BACK_LEFT**: The vehicle is moving back to the left, which is a safe action to return to the original lane or to correct a lane change.
10. **CAM_BACK_RIGHT**: The vehicle is moving back to the right, which is a safe action to return to the original lane or to correct a lane change.
11. **CAM_BACK_LEFT**: The vehicle is moving back to the left, which is a safe action to return to the original lane or to correct a lane change.
12. **CAM_BACK_RIGHT**: The vehicle is moving back to the right, which is a safe action to return to the original lane or to correct a lane |
| 6 | Question (planning) | In this scenario, what are safe actions to take for the ego vehicle? |
| GT Answer | Keep going at the same speed. |
| InternVL2-2B | Based on the image, the safe actions for the ego vehicle would include maintaining a safe following distance from the truck, ensuring that the truck is in the correct lane, and not overtaking the truck. The ego vehicle should also be prepared to stop if necessary, as the truck is in the right lane and the ego vehicle is in the left lane. Additionally, the ego vehicle should be aware of the truck's presence and any potential hazards it may pose, such as the truck's size and the possibility of the truck changing lanes. |
| Mini-InternVL2-2B-DA-DriveLM | Slightly offset to the right, slightly offset to the left, and keep going at the same speed are safe actions to take for the ego vehicle in this scenario. |
| llama_adapter_v2_multimodal7b | In the image, there are multiple vehicles, including a truck and a car, driving down a street with buildings in the background. To ensure safety, the ego vehicle should follow these actions: 1. Maintain a safe distance from other vehicles, especially the truck, as it is larger and may have a longer stopping distance. 2. Be cautious of the truck's blind spots, as it may not be able to see the ego vehicle in certain areas. 3. Keep an eye on the truck's movements, as it may make sudden turns or lane changes. 4. Be prepared for the truck to change lanes or make a turn, as it may need more space to maneuver. 5. Be aware of the truck's braking distance, as it may take longer to come to a complete stop. 6. Keep an eye on the traffic lights and follow traffic rules to avoid accidents and ensure smooth traffic flow. By following these safe actions, the ego vehicle can minimize the risk of accidents and ensure a smooth driving experience. |
| InternVL2-2b-LoRA-25k-drivelm | Keep going at the same speed, accelerate, and proceed. |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | Keep going at the same speed, decelerate gradually without braking. |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | Keep going at the same speed, decelerate gradually without braking. |
| InternVL2-2b-LoRA-300k-drivelm | Keep going at the same speed, decelerate gradually without braking. |
| llava-v1.6-mistral-7b-hf | In the given scenario, the ego vehicle is a white SUV driving on a city street. To ensure safe actions, the driver should:
1. **Maintain a Safe Following Distance**: Keep a safe distance from the vehicle in front to allow for adequate reaction time in case of sudden braking or changes in traffic flow.
2. **Obey Traffic Signs and Signals**: The driver should follow all traffic signs and signals, including speed limits, stop signs, and traffic lights.
3. **Be Aware of Pedestrians**: The driver should be cautious around crosswalks and pedestrian areas, yielding to pedestrians when required.
4. **Watch for Cyclists**: The driver should be vigilant for cyclists, especially when turning or changing lanes.
5. **Use Turn Signals**: When making turns, the driver should use the appropriate turn signals to communicate their intentions to other road users.
6. **Stay in the Lane**: The driver should stay within the marked lanes and avoid drifting into other lanes or onto the sidewalk.
7. **Be Prepared for Weather Conditions**: If the weather is rainy or foggy, as suggested by the images, the driver should reduce speed, increase following distance, and use headlights appropriately.
8. **Be Alert for Other Vehicles**: The driver should be aware of other vehicles on the road, including trucks, which may have blind spots.
9. **Use Mirrors and Blind Spot Checks**: Regularly check mirrors and blind spots to ensure no other vehicles are in the blind spots or too close.
10. **Stay Focused**: The driver should avoid distractions, such as using a mobile phone, and stay focused on the road.
By following these safety guidelines, the driver can navigate the city streets safely and responsibly. |
| 7 | Question (behavior) | Predict the behavior of the ego vehicle. Please select the correct answer from the following options: A. The ego vehicle is slightly steering to the right. The ego vehicle is driving fast. B. The ego vehicle is going straight. The ego vehicle is not moving. C. The ego vehicle is slightly steering to the right. The ego vehicle is driving slowly. D. The ego vehicle is slightly steering to the left. The ego vehicle is driving with normal speed. |
| GT Answer | B |
| InternVL2-2B | Based on the image, the ego vehicle is driving slowly. This can be inferred from the positioning of the vehicle in the middle of the intersection, with no immediate signs of acceleration or braking. The other options are not supported by the visual evidence in the image. |
| Mini-InternVL2-2B-DA-DriveLM | B |
| llama_adapter_v2_multimodal7b | The correct answer is option B: The ego vehicle is driving fast. |
| InternVL2-2b-LoRA-25k-drivelm | B |
| InternVL2-2b-LoRA-25k-drivelm-offline-redcircle-ctag-bkgd | B |
| InternVL2-2b-LoRA-25k_plus_DL-PL-10pct | B |
| InternVL2-2b-LoRA-300k-drivelm | C. The ego vehicle is slightly steering to the right. The ego vehicle is driving slowly. |
| llava-v1.6-mistral-7b-hf | A. The ego vehicle is slightly steering to the right. The ego vehicle is driving fast. |
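
The perception question in row 0 asks for a list of `(c, s, x1, y1, x2, y2)` tuples, and several of the answers above are truncated or use a different layout. As a rough illustration of how such answers might be read programmatically, the following Python sketch extracts only complete, well-formed tuples and computes a plain intersection-over-union between two boxes. The function names `parse_elements` and `iou` are hypothetical, and IoU is used here purely as an assumption for illustration; the table does not state which metric, if any, was applied to these outputs.

```python
import re
from typing import List, Tuple

# (category, status, x1, y1, x2, y2) -- the format requested in the row 0 question.
Element = Tuple[str, str, float, float, float, float]

_NUM = r"(-?\d+(?:\.\d+)?)"
# One complete tuple: two free-text fields followed by exactly four numbers.
_TUPLE = re.compile(
    r"\(\s*([^,()\[\]]+?)\s*,\s*([^,()\[\]]+?)\s*,\s*"
    + r"\s*,\s*".join([_NUM] * 4)
    + r"\s*\)"
)


def parse_elements(answer: str) -> List[Element]:
    """Return every complete (c, s, x1, y1, x2, y2) tuple found in `answer`.

    Truncated trailing tuples and other layouts are silently skipped.
    """
    return [
        (c, s, float(x1), float(y1), float(x2), float(y2))
        for c, s, x1, y1, x2, y2 in _TUPLE.findall(answer)
    ]


def iou(a: Element, b: Element) -> float:
    """Intersection-over-union of the boxes of two parsed elements."""
    ax1, ay1, ax2, ay2 = a[2:]
    bx1, by1, bx2, by2 = b[2:]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = "[(traffic light, green, 830.99, 203.87, 854.29, 280.31)]"
    # Truncated mid-tuple, as some answers in the table are.
    pred = "[(traffic light, green, 747.19, 0.58, 779.48, 99.31), (traffic light, unknown, 1453.51, 300.83,"
    print(parse_elements(gt))
    print(parse_elements(pred))  # only the first, complete tuple is kept
```

Under this sketch, answers that never emit a six-field tuple (for example the four-number `(0, 0, 0, 0)` boxes or the `traffic light[[...]]` layout) simply parse to an empty list, which keeps format failures separate from localization errors.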